---
title: Migrating PXF from Greenplum 5.x to 6.x
---

> **Note** PXF also supports `gpupgrade`-based Greenplum upgrade in addition to the manual migration described in this topic.

If you are using PXF in your Greenplum Database 5.x installation, upgrade Greenplum Database to version **5.21.2** or newer before you migrate PXF to Greenplum Database **6.10.1** or newer.

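The PXF Greenplum Database 5.x to 6.x migration procedure has two parts. You perform the first PXF procedure in your Greenplum Database 5.x installation. Then, after you install and configure Greenplum Database 6.x and migrate your data to it, you perform the second PXF procedure in the Greenplum 6.x installation: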

-   [Step 1: Complete PXF Greenplum Database 5.x Pre-Migration Actions](#pxf5pre)
-   [Step 2: Migrating PXF to Greenplum 6.x](#pxfmig)


## <a id="prerequisites"></a>Prerequisites

Before migrating PXF from Greenplum 5.x to Greenplum 6.x, ensure that you can:

- Identify your operating system version. Greenplum Database 6.x does not support running PXF on CentOS 6.x or RHEL 6.x due to a limitation with the version of `cURL`. See [Known Issues and Limitations](https://docs.vmware.com/en/VMware-Greenplum/6/greenplum-database/relnotes-release-notes.html#known-issues-and-limitations) in the Greenplum Database release notes for more details.
- Identify the version number of your Greenplum 5.x installation. If it is older than 5.21.2, upgrade to a newer 5.x version.
- Identify the version of PXF running in the Greenplum 5.x cluster.
- Identify the location of your PXF installation:

    - If you installed PXF from a separate `rpm` or `deb`, the PXF install location is `/usr/local/pxf-gp<greenplum-major-version>`.
    - If you are running the PXF bundled in the Greenplum Database 5.x Server installation, the PXF install location is `$GPHOME/pxf`.
- Determine if you have `gphdfs` external tables defined in your Greenplum 5.x installation.
- Identify the file system location of the PXF `$PXF_CONF` directory in your Greenplum 5.x installation. (If you are unsure of the location, you can find the value in `pxf-env-default.sh`.) In Greenplum 5.15 and later, `$PXF_CONF` identifies the user configuration directory that was provided to the `pxf cluster init` command. This directory contains PXF server configurations, security keytab files, and log files.
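
As a rough aid for the version checks listed above, the following shell sketch compares a version string against the 5.21.2 minimum and extracts the Greenplum version from the text that `SELECT version()` returns. The function names `version_at_least` and `gp_version_from_banner` are hypothetical helpers, not part of Greenplum or PXF, and the comparison assumes that your `sort` supports the `-V` (version sort) option, as GNU coreutils does.

```shell
# Hypothetical helper: succeed (exit 0) when version $1 is equal to or
# newer than version $2. Assumes `sort -V` is available (GNU coreutils).
version_at_least() {
    local actual="$1" required="$2"
    # After a version sort, the required version must come first (or tie).
    [ "$(printf '%s\n%s\n' "$required" "$actual" | sort -V | head -n 1)" = "$required" ]
}

# Hypothetical helper: read the output of SELECT version() on stdin and
# print only the Greenplum Database version number that it contains.
gp_version_from_banner() {
    sed -n 's/.*Greenplum Database \([0-9][0-9.]*\).*/\1/p'
}
```

For example, `psql -d postgres -At -c 'SELECT version();' | gp_version_from_banner` prints the version number, which you can then pass to `version_at_least "<version>" 5.21.2` to decide whether a 5.x upgrade is needed first.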


## <a id="pxf5pre"></a>Step 1: Complete PXF Greenplum Database 5.x Pre-Migration Actions

Perform this procedure in your Greenplum Database 5.x installation:

1. Log in to the Greenplum Database coordinator node. For example:

    ``` shell
    $ ssh gpadmin@<gp5coordinator>
    ```

2. Identify and note the Greenplum Database version number of your 5.x installation. For example:

    ``` shell
    gpadmin@gp5coordinator$ psql -d postgres
    ```

    ``` sql
    SELECT version();

               version 
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 PostgreSQL 8.3.23 (Greenplum Database 5.21.2 build commit:610b6d777436fe4a281a371cae85ac40f01f4f5e) on x86_64-pc-linux-gnu, compiled by GCC gcc (GCC) 6.2.0, 64-bit compiled on Aug  7 2019 20:38:47
(1 row)
    ```

2. Identify and note the version number of PXF running in your Greenplum 5.x installation. For example:

    ``` shell
    gpadmin@gp5coordinator$ pxf version
    ```

3. Greenplum 6 removes the `gphdfs` external table protocol. If you have `gphdfs` external tables defined in your Greenplum 5.x installation, you must either delete them or migrate them to `pxf` as described in [Migrating gphdfs External Tables to PXF](gphdfs-pxf-migrate.html).

3. Stop PXF on each segment host as described in [Stopping PXF](https://docs.vmware.com/en/VMware-Greenplum-Platform-Extension-Framework/6.6/greenplum-platform-extension-framework/cfginitstart_pxf.html#stop_pxf).

4. If you plan to install Greenplum Database 6.x on a new set of hosts, be sure to save a copy of the `$PXF_CONF` directory in your Greenplum 5.x installation.

5. Install and configure Greenplum Database 6.x, migrate Greenplum 5.x table definitions and data to your Greenplum 6.x installation, and then continue your PXF migration with [Step 2: Migrating PXF to Greenplum 6.x](#pxfmig).
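
One possible way to save the copy of the `$PXF_CONF` directory mentioned in the procedure above is to archive it with `tar` before you move to the new hosts. This is a minimal sketch, not a documented PXF procedure: the function name `backup_pxf_conf` and the archive path that you pass to it are hypothetical.

```shell
# Hypothetical helper: archive a PXF configuration directory.
# $1: directory to archive (for example, the value of $PXF_CONF)
# $2: path of the .tar.gz archive to create
backup_pxf_conf() {
    local conf_dir="$1" archive="$2"
    # Refuse to continue if the source directory does not exist.
    [ -d "$conf_dir" ] || { echo "not a directory: $conf_dir" >&2; return 1; }
    # -C stores paths relative to the parent directory, so the archive
    # unpacks to a single top-level directory.
    tar -czf "$archive" -C "$(dirname "$conf_dir")" "$(basename "$conf_dir")"
}
```

For example, `backup_pxf_conf "$PXF_CONF" /tmp/pxf-conf-gp5.tar.gz` creates an archive that you can later unpack on the Greenplum 6.x coordinator with `tar -xzf`.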


## <a id="pxfmig"></a>Step 2: Migrating PXF to Greenplum 6.x

After you have installed Greenplum Database 6.x and migrated or upgraded the table definitions and data from your Greenplum 5.x installation, perform the following procedure to configure and start the new PXF software:

1. Log in to the Greenplum Database 6.x coordinator node. For example:

    ``` shell
    $ ssh gpadmin@<gp6coordinator>
    ```

2. Identify and note the version number of your Greenplum Database 6.x installation.

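3. PXF provides different software packages for Greenplum Database version 5 and version 6. Install the Greenplum 6 package that corresponds to the PXF version running in your Greenplum 5.x installation: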
    1. If you are running version 6.x of the independent PXF distribution that you installed via an `rpm` or `deb` in your Greenplum 5.x installation:
        1. Install the same PXF 6.x version *for Greenplum 6* on your Greenplum 6.x hosts as described in [Installing PXF](https://docs.vmware.com/en/VMware-Greenplum-Platform-Extension-Framework/6.6/greenplum-platform-extension-framework/installing_pxf.html).
        1. Register the PXF extension, which copies the PXF extension control file from the PXF installation directory to the new Greenplum 6.x install directory:

            ``` shell
            gpadmin@gp6coordinator$ pxf cluster register
            ```
        1. Start PXF on each host:

            ``` shell
            gpadmin@gp6coordinator$ pxf cluster start
            ```
        1. Skip the following steps and exit this procedure.

    1. If you are running PXF version 5.x, you must install the latest independent PXF 5.16.x distribution for Greenplum 6 on your Greenplum 6.x hosts as described in [Installing PXF](https://docs.vmware.com/en/VMware-Greenplum-Platform-Extension-Framework/6.6/greenplum-platform-extension-framework/installing_pxf.html).

        <div class="note"><b>Note:</b> If you are interested in running PXF version 6.x, upgrade to that version <b>after</b> you complete this entire procedure and verify that PXF is working in your Greenplum 6.x installation.</div>

1. Identify and note the version number of the PXF installation that you are migrating to. For example:

    ``` shell
    gpadmin@gp6coordinator$ pxf version
    ```

2. If you installed Greenplum Database 6.x on a new set of hosts, copy the `$PXF_CONF` directory from your Greenplum 5.x installation to the Greenplum 6.x coordinator node. Consider copying the directory to the same file system location at which it resided in the 5.x cluster. For example, if `PXF_CONF=/usr/local/greenplum-pxf`:

    ``` shell
    gpadmin@gp6coordinator$ scp -r gpadmin@<gp5coordinator>:/usr/local/greenplum-pxf /usr/local/
    ```

3. Initialize PXF on each segment host as described in [Initializing PXF](https://docs.vmware.com/en/VMware-Greenplum-Platform-Extension-Framework/6.6/greenplum-platform-extension-framework/init_pxf.html), specifying the `PXF_CONF` directory that you copied in the step above.

5. **If you are migrating PXF from Greenplum Database version 5.23 or earlier** and you have configured any JDBC servers that access Kerberos-secured Hive, you must now set the `hadoop.security.authentication` property in the `jdbc-site.xml` file to explicitly identify use of the Kerberos authentication method. Perform the following for each of these server configs:

    1. Navigate to the server configuration directory.
    2. Open the `jdbc-site.xml` file in the editor of your choice and uncomment or add the following property block to the file:

        ```xml
        <property>
            <name>hadoop.security.authentication</name>
            <value>kerberos</value>
        </property>
        ```
    3. Save the file and exit the editor.

5. **If you are migrating PXF from Greenplum Database version 5.26 or earlier**: The PXF `Hive` and `HiveRC` profiles now support column projection using column name-based mapping. If any of your existing PXF external tables specify one of these profiles and rely on column index-based mapping, you may need to drop and recreate those tables:

    1. Identify all PXF external tables that you created that specify a `Hive` or `HiveRC` profile.

    2. For *each* external table that you identify in step 1, examine the definitions of both the PXF external table and the referenced Hive table. If the column names of the PXF external table *do not* match the column names of the Hive table:

        1. Drop the existing PXF external table. For example:

            ``` sql
            DROP EXTERNAL TABLE pxf_hive_table1;
            ```

        2. Recreate the PXF external table using the Hive column names. For example:

            ``` sql
            CREATE EXTERNAL TABLE pxf_hive_table1( hivecolname int, hivecolname2 text )
              LOCATION( 'pxf://default.hive_table_name?PROFILE=Hive')
            FORMAT 'custom' (FORMATTER='pxfwritable_import');
            ```

        3. Review any SQL scripts that you may have created that reference the PXF external table, and update column names if required.

5. Synchronize the PXF configuration from the Greenplum Database 6.x coordinator host to the standby coordinator and each segment host in the cluster. For example:

    ``` shell
    gpadmin@gp6coordinator$ pxf cluster sync
    ```
 
6. Start PXF on each Greenplum Database 6.x segment host:

    ``` shell
    gpadmin@gp6coordinator$ pxf cluster start
    ```

    Your Greenplum Database 6.x cluster is now running PXF version 5.16.x from the PXF installation directory (`/usr/local/pxf-gp<greenplum-major-version>`). If you want to upgrade PXF in the future, consult the [PXF upgrade documentation](https://docs.vmware.com/en/VMware-Greenplum-Platform-Extension-Framework/6.6/greenplum-platform-extension-framework/upgrade_landing.html).

7. Verify the migration by testing that each PXF external table can access the referenced data store.
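
For the `jdbc-site.xml` step in the procedure above, it may help to list the server configurations that do not yet mention the `hadoop.security.authentication` property. The following sketch assumes that server configurations reside under `$PXF_CONF/servers/<server_name>/`. The function name `jdbc_sites_missing_auth` is hypothetical. The check is a plain text search, so a file in which the property is present but commented out is not reported and still needs a manual review.

```shell
# Hypothetical helper: print every jdbc-site.xml file below a PXF
# configuration directory that does not mention the
# hadoop.security.authentication property at all.
# $1: PXF configuration directory (assumed layout: <dir>/servers/<name>/)
jdbc_sites_missing_auth() {
    find "$1/servers" -type f -name jdbc-site.xml | while IFS= read -r f; do
        # Text match only: a commented-out property also counts as present.
        grep -q 'hadoop.security.authentication' "$f" || printf '%s\n' "$f"
    done
}
```

Running `jdbc_sites_missing_auth "$PXF_CONF"` prints one file path per line. Only the files that belong to JDBC servers accessing Kerberos-secured Hive need the property added.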

